
(CVPR 2016) Inverting Visual Representations with Convolutional Networks

Dosovitskiy A., Brox T. Inverting Visual Representations with Convolutional Networks. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.



1. Overview


This paper proposes a new approach to studying image representations: inverting them with an up-convolutional neural network

  • applied to shallow representations: HOG, SIFT, LBP
  • applied to deep representations: DNN (AlexNet)


1.1. Conclusion

1.1.1. Representation of AlexNet

  • Features from all layers of the network preserve the precise colors and the rough position of objects in the image
  • In higher layers, almost all information about the input image is contained in the pattern of non-zero activations, not their precise values
  • In layer FC8, most information about the input image is contained in the small probabilities of the classes outside the network's top-5 predictions
1.2. Related Work

  • Local Binary Patterns (LBP). LBP features are not differentiable
  • SIFT. a keypoint-based representation

1.2.1. Existing Methods Based on Gradient Descent

  • invert a differentiable image representation Φ using gradient descent, so they cannot be applied to non-differentiable features such as LBP
  • optimize the difference between feature vectors, not the image reconstruction error
  • involve optimization at test time

1.3. Methods

1.3.1. Loss Function



  • Φ. the feature vector of the representation being inverted
  • w. parameters of the inverting CNN f
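Concretely, the inversion network f is trained to minimize the squared reconstruction error over the training images (a reconstruction of the paper's objective in standard notation):

```latex
\hat{w} = \arg\min_{w} \sum_{i} \left\| x_i - f\big(\Phi(x_i),\, w\big) \right\|_2^2
```

Note that the loss is measured in image space, not feature space, which is what distinguishes this method from gradient-descent inversion.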

1.3.2. Network


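The decoder is built from up-convolutional (transposed-convolution) layers that progressively increase spatial resolution. A minimal NumPy sketch of a single stride-2 up-convolution, implemented as zero-dilation followed by a full convolution (the kernel here is illustrative, not from the paper):

```python
import numpy as np

def upconv2d(x, kernel, stride=2):
    """Stride-s transposed convolution ("up-convolution") of a 2-D map.

    Equivalent to inserting (stride - 1) zeros between input pixels and
    then running an ordinary full convolution with the kernel.
    """
    h, w = x.shape
    kh, kw = kernel.shape
    # Zero-dilate the input.
    up = np.zeros(((h - 1) * stride + 1, (w - 1) * stride + 1))
    up[::stride, ::stride] = x
    # Full convolution: output size grows by (kernel - 1) in each dimension.
    out = np.zeros((up.shape[0] + kh - 1, up.shape[1] + kw - 1))
    for i in range(up.shape[0]):
        for j in range(up.shape[1]):
            out[i:i + kh, j:j + kw] += up[i, j] * kernel
    return out

x = np.arange(9, dtype=float).reshape(3, 3)
k = np.ones((2, 2))
y = upconv2d(x, k, stride=2)
print(y.shape)  # (6, 6): spatial resolution doubled
```

In the actual network, each up-convolution is learned (the kernel is a trainable parameter) and is followed by a nonlinearity.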

1.3.3. HOG and LBP

  • HOG features. W/8 × H/8 × 31
  • LBP features. W/16 × H/16 × 58
  • these maps are further processed by convolutional layers until the spatial size is 64 times smaller than the input, then decoded by up-convolutions
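The shape arithmetic above determines how many extra strided layers each feature type needs before decoding. A quick check (the 224×224 input size and the use of stride-2 convolutions are illustrative assumptions):

```python
import math

# Spatial sizes of the shallow feature maps for a 224x224 input.
W = H = 224
hog = (W // 8, H // 8, 31)    # (28, 28, 31), at 1/8 resolution
lbp = (W // 16, H // 16, 58)  # (14, 14, 58), at 1/16 resolution

# To reach 1/64 resolution, HOG needs a further factor of 64/8 = 8 = 2**3,
# i.e. three stride-2 convolutions; LBP (1/16) needs two.
extra_stride2 = {"hog": int(math.log2(64 // 8)),
                 "lbp": int(math.log2(64 // 16))}
print(extra_stride2)  # {'hog': 3, 'lbp': 2}
```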

1.3.4. Sparse SIFT

  • N keypoints
  • each keypoint contains: coordinates (x, y), scale s, orientation α, and a feature descriptor f
  • split the image into cells of size d × d; this yields a W/d × H/d grid
  • feature. W/d × H/d × (D+5), where D is the descriptor dimension
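To make the sparse keypoint set consumable by a CNN, it is rasterized onto the cell grid. A sketch of this conversion; the exact choice of the 5 extra channels (in-cell offset, scale, sin/cos of orientation) is an assumption for illustration, the paper stores the keypoint's geometric parameters alongside its descriptor:

```python
import numpy as np

def sift_to_grid(keypoints, W, H, d, D):
    """Rasterize sparse SIFT keypoints onto a W/d x H/d grid.

    Each cell stores the keypoint's D-dim descriptor plus 5 extra values
    encoding the keypoint geometry (channel layout is an assumption here).
    """
    grid = np.zeros((H // d, W // d, D + 5))
    for (x, y, s, alpha, f) in keypoints:
        i, j = int(y // d), int(x // d)
        grid[i, j, :D] = f
        # In-cell offset, scale, and orientation as sin/cos.
        grid[i, j, D:] = [x % d, y % d, s, np.sin(alpha), np.cos(alpha)]
    return grid

kp = [(10.0, 20.0, 1.5, 0.3, np.ones(128))]
g = sift_to_grid(kp, W=64, H=64, d=4, D=128)
print(g.shape)  # (16, 16, 133)
```

Cells containing no keypoint stay at zero, which is how the sparsity of the representation is encoded.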

1.4. Experiments

1.4.1. Metric


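Reconstructions are compared to the ground-truth image with a normalized Euclidean error. A minimal sketch; normalizing by the norm of the ground-truth image is an assumption here (the paper normalizes by an average image norm):

```python
import numpy as np

def normalized_error(x, x_hat):
    """Euclidean reconstruction error, normalized by the ground-truth norm.

    The normalization constant is an assumption for illustration.
    """
    return np.linalg.norm(x_hat - x) / np.linalg.norm(x)

x = np.ones((4, 4))
print(normalized_error(x, x))      # 0.0 for a perfect reconstruction
print(normalized_error(x, 2 * x))  # 1.0
```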

1.4.2. Shallow Representation




1.4.3. Deep Representation